Evaluation of spoken language recognition technology using broadcast speech: performance and challenges

نویسندگان

  • Luis Javier Rodríguez-Fuentes
  • Amparo Varona
  • Mireia Díez
  • Mikel Peñagarikano
  • Germán Bordel
چکیده

Spoken Language Recognition (SLR) technology has remarkably improved in the last years, partly thanks to NIST Language Recognition Evaluations (LRE), which have become standard benchmarks for testing new approaches. NIST evaluations focus on narrow-band conversational telephone speech and deal with some specific target languages. Recent efforts to expand the scope of SLR technology assessment include the Albayzin 2008 and 2010 LRE, which deal with wide-band TV broadcast speech. In this work, a SLR system based on state-of-the-art approaches is developed and evaluated on the Albayzin 2008 and 2010 LRE datasets, looking to identify those conditions that make the task challenging and eventually to guide the design of future evaluations using the same kind of data. We present and analyse system performance under different conditions, regarding: (1) the set of target languages (including details about the confusion of languages with each other) and the amount of data available to estimate models; and (3) the presence of background noise.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...

متن کامل

KALAKA-3: a database for the recognition of spoken European languages on YouTube audios

This paper describes the main features of KALAKA-3, a speech database specifically designed for the development and evaluation of language recognition systems. The database provides TV broadcast speech for training, and audio data extracted from YouTube videos for tuning and testing. The database was created to support the Albayzin 2012 Language Recognition Evaluation, which featured two langua...

متن کامل

N-best: the northern- and southern-dutch benchmark evaluation of speech recognition technology

In this paper, we describe N-best 2008, the first Large Vocabulary Speech Recognition (LVCSR) benchmark evaluation held for the Dutch language. Both the accent as spoken in the Netherlands (Northern-Dutch) and in Belgium (Southern-Dutch or Flemish), will be evaluated. The evaluation tasks are broadcast news (BN) and conversational telephone speech (CTS). The N-best evaluation will take place in...

متن کامل

مقایسه روش های طیفی برای شناسایی زبان گفتاری

Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...

متن کامل

Towards High Performance Phonotactic Feature for Spoken Language Recognition

With the demands of globalization, multilingual speech is increasingly common in conversational telephone speech, broadcast news and internet podcasts. Therefore, automatic spoken language recognition has become an important technology in multilingual speech related applications. For example, automatic spoken language recognition has been used as a preprocessing component for spoken language tr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012